Abstract: Consider data transmission over a binary-input
additive white Gaussian noise channel using a binary low-density
parity-check code. We ask the following question: Given
a decoder that takes log-likelihood ratios as input, does it help
to modify the log-likelihood ratios before decoding? If we use
an optimal decoder then it is clear that modifying the
log-likelihoods cannot possibly help the decoder's performance, and
so the answer is "no." However, for a suboptimal decoder like the
linear programming decoder, the answer might be "yes": In this
paper we prove that for certain interesting classes of low-density
parity-check codes and large enough SNRs, it is advantageous
to truncate the log-likelihood ratios before passing them to the
linear programming decoder.

I. Introduction 

While maximum-likelihood (ML) decoding of low-density 
parity-check (LDPC) codes is reasonably well understood 
based on the expected weight distribution of the codes, the 
linear programming (LP) and the related belief propagation 
(BP) decoding of LDPC codes reveal a number of interesting 
and unexpected phenomena. The root cause of the difference
between these suboptimal decoders and ML decoding
is the occurrence of so-called pseudo-codewords; from the
perspective of an LP or BP decoder, the pseudo-codewords
act as attractive solutions to the decoding problem, even
though they are not actual codewords in the LDPC code under
consideration. In contrast to codewords which, for codes of
length $n$ and under antipodal signaling, map to elements of
the set $\{+1, -1\}^n$, pseudo-codewords are vectors of length $n$
that map to vectors with entries that lie in the interval $[-1, +1]$.
Note that the set of possible pseudo-codewords is a function
not only of the code but also of the chosen parity-check matrix.

This paper explores one of the above-mentioned unexpected 
phenomena of LP decoding and discusses the roots of this 
behavior. Considering the tight relationship between LP decoding
and iterative decoding [1], [2], [3], [4], our observations
about LP decoding must also have consequences for iterative
decoding. Before we start describing that phenomenon, let us
first explain the communication setup (see Fig. 1) that is under
consideration.

• We use a binary channel code of length n, dimension k, 
and rate k/n. 




Fig. 1. Communication setup under consideration. (See main text for
explanations.)

• The information word $u \in \{0, 1\}^k$ is encoded into the
codeword $x \in \{0, 1\}^n$. We assume that all information
words are chosen with equal likelihood.

• Let $\theta : \mathbb{R} \to \mathbb{R}$, $\omega_i \mapsto 1 - 2\omega_i$. Restricting the domain of
$\theta$ to $\{0, 1\}$ we obtain the usual BPSK mapping: $0 \mapsto +1$
and $1 \mapsto -1$. When applying the map $\theta$ to a vector we
define the result to be a vector where each component
is mapped according to $\theta$. Instead of $\theta(\omega_i)$ and $\theta(\omega)$ we
will very often simply write $\bar\omega_i$ and $\bar\omega$, respectively. For
our communication setup this means that the codeword
$x \in \{0, 1\}^n$ is mapped to its signal-space point
$\bar x = \theta(x) = (\theta(x_1), \ldots, \theta(x_n)) \in \{+1, -1\}^n$.

• For $i = 1, \ldots, n$, the symbols $\bar x_i$ are sent over a
(binary-input) additive white Gaussian noise channel (AWGNC)
with noise power $N_0/2$, i.e. we receive $Y_i = \bar X_i + Z_i$,
where $\{Z_i\}_{i=1}^{n}$ are i.i.d. random variables with
$Z_i \sim \mathcal{N}(0, N_0/2)$. Here, $\mathcal{N}(\mu, \sigma^2)$ denotes a Gaussian random
variable with mean $\mu$ and variance $\sigma^2$.

• Based on the observations $Y_i = y_i$, $i = 1, \ldots, n$, we
compute the normalized log-likelihood ratios (LLRs)

\[
\lambda_i \triangleq \eta \cdot \log\left(\frac{p_{Y_i|\bar X_i}(y_i|+1)}{p_{Y_i|\bar X_i}(y_i|-1)}\right)
= \eta \cdot \log\left(\frac{p_{Y_i|X_i}(y_i|0)}{p_{Y_i|X_i}(y_i|1)}\right),
\]

where the normalization constant $\eta = \eta(N_0)$ is chosen
such that $\lambda_i$ equals $+1$ if $z_i = 0$.

• A mapping $\mu : \mathbb{R} \to \mathbb{R}$ is applied to the LLRs and results
in the modified LLRs $\lambda'_i = \mu(\lambda_i)$, $i = 1, \ldots, n$.

• Based on the modified LLR vector $\lambda'$, a decoder $\phi$ tries
to make a decision $\hat{\bar x} = \phi(\lambda')$ about $\bar x$. (Or, alternatively,
tries to decide on $u$ or $x$.)

• When decoding a code of length $n$, we use the label
$P^{\mu}_{\phi}(n)$ for denoting the block error probability of a
decoder $\phi$ which bases its decisions on the modified LLR
vector $\lambda' = \mu(\lambda)$.
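
As a rough illustration of the setup above, the following Python sketch simulates the chain from a binary codeword to the normalized LLRs. It assumes that the normalization constant works out to $\eta = N_0/4$, which follows from the Gaussian density with variance $N_0/2$ and gives $\lambda_i = y_i$; the function name `normalized_llrs` and the `rng` argument are illustrative choices and do not appear in the paper.

```python
import math
import random


def normalized_llrs(codeword, n0, rng):
    """Send a binary codeword over the AWGNC and return normalized LLRs.

    Assumption: eta = N0 / 4, so that the LLR equals +1 when the noise is 0
    and the symbol +1 was sent.
    """
    sigma = math.sqrt(n0 / 2.0)  # noise standard deviation (variance N0/2)
    eta = n0 / 4.0               # assumed normalization constant
    llrs = []
    for bit in codeword:
        x_bar = 1 - 2 * bit                  # BPSK map: 0 -> +1, 1 -> -1
        y = x_bar + rng.gauss(0.0, sigma)    # received value
        raw = 4.0 * y / n0                   # log p(y|+1) / p(y|-1)
        llrs.append(eta * raw)               # numerically equal to y
    return llrs


if __name__ == "__main__":
    print(normalized_llrs([0, 1, 1, 0], n0=0.5, rng=random.Random(0)))
```

Under this normalization the unmodified LLR vector is simply the received vector, which is why the thresholding discussed later can be read as clipping the channel output.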



Let $\bar C = \theta(C)$ be the set of points in signal space that
correspond to the codewords. Using the (normalized) LLR
vector $\lambda$, the maximum likelihood (ML) decoder $\phi_{\mathrm{ML}}$ can
be cast as

\[
\phi_{\mathrm{ML}}(\lambda') = \arg\max_{\bar x \in \bar C} \sum_{i=1}^{n} \bar x_i \lambda'_i \tag{1}
\]

with the trivial mapping $\lambda'_i = \mu_{\mathrm{triv}}(\lambda_i) = \lambda_i$, $i = 1, \ldots, n$.
From this expression it is clear that the LLR vector $\lambda$ is a
sufficient statistic for optimal decoding. Moreover, using the
data-processing inequality (see e.g. [5]) it can easily be shown
that there is no mapping $\mu$ such that for a given code of length
$n$ there is a decoder $\phi$ such that $P^{\mu}_{\phi}(n) < P^{\mu_{\mathrm{triv}}}_{\phi_{\mathrm{ML}}}(n)$.
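
For very short codes the ML rule can be evaluated by exhaustive search. The sketch below is only meant to make the correlation metric $\sum_i \bar x_i \lambda'_i$ concrete; the function name `ml_decode` and the explicit list of codewords are assumptions of this illustration, and the approach is of course impractical for codes of realistic length.

```python
def ml_decode(codewords, llrs):
    """Brute-force ML decoding: pick the codeword whose BPSK image has the
    largest correlation with the LLR vector."""
    def correlation(codeword):
        # (1 - 2*b) is the BPSK image of bit b
        return sum((1 - 2 * b) * l for b, l in zip(codeword, llrs))

    return max(codewords, key=correlation)


if __name__ == "__main__":
    repetition_code = [(0, 0, 0), (1, 1, 1)]
    print(ml_decode(repetition_code, [-0.1, 1.0, 1.0]))
```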

The situation is not as simple in the case of suboptimal 
decoders, e.g. the linear programming (LP) decoder [3], [4]. 
In fact, combining the results in [6] and [1], we show that for 
certain low-density parity-check (LDPC) codes and for high 
enough SNR it is favorable not to use the trivial map $\mu_{\mathrm{triv}}$,
but to use a two-level quantization map

\[
\lambda'_i = \mu_{\mathrm{Q2},L}(\lambda_i)
\]

before performing the LP decoding.

This seeming paradox is not uncommon for suboptimal 
algorithms. We cite the following paragraph from Ganti et 
al. [7, p. 2316] which remarks on a similar phenomenon (albeit 
in a different context): "[. . . ] Indeed, in the matched case 
it is clear that the optimal decoder for the general channel 
performs at least as well as a decoder that first quantizes the 
output and then performs optimal processing on the quantized 
samples. Under mismatched decoding, however, it is unclear 
how to relate the performance of the mismatched decoder on 
the original channel to its performance on the output-quantized 
channel." 

A natural question arises: Is the advantage of using the two-level
quantization map the result of a quantization effect, or
something else? We show that there are code families such
that for any finite $W$, the thresholding map

\[
\lambda'_i = \mu_{\mathrm{T},W}(\lambda_i) =
\begin{cases}
+W & \text{if } \lambda_i > +W \\
-W & \text{if } \lambda_i < -W \\
\lambda_i & \text{otherwise}
\end{cases}
\tag{2}
\]

is also preferable to the trivial map $\mu_{\mathrm{triv}}$. This suggests that the
asymptotic advantage over $\mu_{\mathrm{triv}}$ is gained not by quantization,
but rather by restricting the LLRs to have finite support.
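
The two LLR modifications can be written down in a few lines. This is a minimal sketch: `mu_threshold` clips an LLR to the interval $[-W, +W]$ as in the thresholding map, while `mu_quantize` is one plausible reading of the two-level quantization map, namely keeping only the sign and a magnitude $L$. The function names and the tie-breaking at zero are assumptions of this illustration.

```python
def mu_threshold(llr, w):
    """Thresholding map: clip the LLR to the interval [-w, +w]."""
    return max(-w, min(w, llr))


def mu_quantize(llr, level):
    """Two-level quantization (assumed form): keep only the sign of the LLR.

    Mapping an LLR of exactly 0 to +level is an arbitrary choice made here.
    """
    return level if llr >= 0 else -level


if __name__ == "__main__":
    llrs = [3.2, -5.0, 0.3, -0.4]
    print([mu_threshold(l, 1.0) for l in llrs])
    print([mu_quantize(l, 1.0) for l in llrs])
```

Note that thresholding leaves small LLRs untouched, so it retains more information than quantization; this is what makes the comparison between the two maps informative.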

The rest of the paper is structured as follows. We will give
a brief introduction to LP decoding and pseudo-codewords
in Sec. II. In Sec. III we will talk about pseudo-codewords
stemming from the canonical completion and their importance
for the asymptotic behavior of the LP decoder. In Secs. IV
and V we will discuss the main results of this paper, namely
we show examples when thresholding and quantizing of the
LLRs can help.

1 For recent work on the notion of pseudo-codewords in decoding we refer
to [8], [9], [2], [1], [10], [3], [4].



II. LP Decoding 
ML decoding as in (1) can also be formulated as

\[
\hat{\bar x} = \phi_{\mathrm{ML}}(\lambda') = \arg\max_{\bar x \in \mathrm{conv}(\bar C)} \sum_{i=1}^{n} \bar x_i \lambda'_i, \tag{3}
\]

where $\mathrm{conv}(\bar C)$ is the convex hull of $\bar C$ and where the mapping
$\mu$ is the trivial mapping $\mu_{\mathrm{triv}}$. Unfortunately, for most codes
of interest, the description complexity of $\mathrm{conv}(\bar C)$ grows
exponentially in the block length and therefore finding the
maximum in (3) with a linear programming solver is highly
impractical for reasonably long codes.$^2$

A standard approach in optimization to simplify
the problem is to replace the maximization over $\mathrm{conv}(\bar C)$ by
a maximization over some easily describable polytope $\bar{\mathcal{P}}$ that
is a relaxation of $\mathrm{conv}(\bar C)$:

\[
\hat{\bar x} = \arg\max_{\bar x \in \bar{\mathcal{P}}} \sum_{i=1}^{n} \bar x_i \lambda'_i. \tag{4}
\]

If $\bar{\mathcal{P}}$ is strictly larger than $\mathrm{conv}(\bar C)$ then the decision rule in
(4) obviously represents a sub-optimal decoder. A relaxation
which works particularly well for LDPC codes is given by the
following approach [3], [4]. Let $C$ be described by an $m \times n$
parity-check matrix $H$ with rows $h_1, h_2, \ldots, h_m$. Then the
polytopes $\mathcal{P} = \mathcal{P}(H)$ and $\bar{\mathcal{P}} = \bar{\mathcal{P}}(H) = \theta(\mathcal{P})$, also called the
fundamental polytopes [1], are defined as

\[
\mathcal{P} = \bigcap_{i=1}^{m} \mathrm{conv}(C_i) \quad \text{with } C_i = \{x \in \{0, 1\}^n \mid h_i x^{\mathsf{T}} = 0 \bmod 2\},
\]

\[
\bar{\mathcal{P}} = \bigcap_{i=1}^{m} \mathrm{conv}(\bar C_i) \quad \text{with } \bar C_i = \theta(C_i).
\]

Note that $\mathcal{P}$ is a convex set within $[0, 1]^n$ that contains $\mathrm{conv}(C)$
but whose description complexity is much smaller than the
description complexity of $\mathrm{conv}(C)$. (A similar comment applies
to $\bar{\mathcal{P}}$, which is a convex set within $[-1, +1]^n$ and which
contains $\mathrm{conv}(\bar C)$.) Points in the set $\mathcal{P}$ will be called
pseudo-codewords, and since $\mathcal{P}$ is a convex polytope, we may restrict
our attention to the vertices of $\mathcal{P}$ (and $\bar{\mathcal{P}}$). Because the set
$\bar{\mathcal{P}}$ is usually strictly larger than $\mathrm{conv}(\bar C)$, the decoding rule
in (4) might deliver a vertex of $\bar{\mathcal{P}}$ that is not the signal-space
equivalent of a codeword; these "fractional" vertices are the
reason for the sub-optimality of LP decoding (cf. [4], [1]).
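
To make the definition of the fundamental polytope more tangible, the following sketch tests whether a point of $[0,1]^n$ lies in it. It does not use the intersection-of-convex-hulls form directly; instead it relies on the well-known equivalent inequality description of the convex hull of a single parity check (for every odd-sized subset $S$ of the check's neighborhood, $\sum_{i \in S} x_i - \sum_{i \notin S} x_i \le |S| - 1$). The function name and the small matrix `H` are illustrative assumptions, and the enumeration is exponential in the check degree, so this is only sensible for low-density matrices.

```python
from itertools import combinations


def in_fundamental_polytope(h, x, tol=1e-9):
    """Return True if x satisfies the box and odd-subset inequalities of
    every row of the parity-check matrix h (list of 0/1 rows)."""
    if any(xi < -tol or xi > 1 + tol for xi in x):
        return False
    for row in h:
        nbrs = [i for i, entry in enumerate(row) if entry == 1]
        for size in range(1, len(nbrs) + 1, 2):      # odd subset sizes only
            for subset in combinations(nbrs, size):
                inside = sum(x[i] for i in subset)
                outside = sum(x[i] for i in nbrs if i not in subset)
                if inside - outside > size - 1 + tol:
                    return False
    return True


if __name__ == "__main__":
    H = [[1, 1, 1, 0],
         [0, 1, 1, 1]]
    print(in_fundamental_polytope(H, [1, 1, 0, 1]))          # a codeword
    print(in_fundamental_polytope(H, [1, 0, 0, 0]))          # not a codeword
    print(in_fundamental_polytope(H, [0.5, 0.5, 0.5, 0.5]))  # fractional point
```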

For analyzing the above setup it turns out to be useful to
define the AWGNC pseudo-weight [11] of a pseudo-codeword
$\omega \in \mathcal{P}$ to be $w_p^{\mathrm{AWGNC}}(\omega) = \|\omega\|_1^2 / \|\omega\|_2^2$, where $\|\omega\|_1$
and $\|\omega\|_2$ are the $L_1$- and $L_2$-norm of $\omega$, respectively. The
significance of $w_p^{\mathrm{AWGNC}}(\omega)$ is the following. The existence
of a pseudo-codeword $\omega = (\omega_1, \omega_2, \ldots, \omega_n) \in \mathcal{P}$ causes
LP decoding to fail to detect the codeword $\mathbf{0}$ if the vector of
received LLRs $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_n)$ satisfies the inequality
$\sum_{i=1}^{n} \bar\omega_i \lambda'_i > \sum_{i=1}^{n} \bar 0_i \lambda'_i$, where $\lambda' = \mu_{\mathrm{triv}}(\lambda) = \lambda$.
Then it can be shown that the squared Euclidean distance from
$\bar{\mathbf{0}} = +\mathbf{1}$ to the plane $\{\lambda' \in \mathbb{R}^n \mid \sum_{i=1}^{n} (\bar\omega_i - \bar 0_i) \lambda'_i = 0\}$ is
$w_p^{\mathrm{AWGNC}}(\omega)$.

2 Exceptions to this observation include for example the class of
convolutional codes with not too many states.
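
The AWGNC pseudo-weight is straightforward to evaluate numerically. The sketch below computes $\|\omega\|_1^2 / \|\omega\|_2^2$ for a given vector; the function name is an illustrative choice. For a 0/1 vector the value reduces to the Hamming weight, which is one way to see that the pseudo-weight generalizes the usual notion of codeword weight.

```python
def awgnc_pseudo_weight(omega):
    """AWGNC pseudo-weight: squared L1 norm divided by squared L2 norm."""
    l1 = sum(abs(w) for w in omega)
    l2_squared = sum(w * w for w in omega)
    if l2_squared == 0:
        raise ValueError("pseudo-weight is undefined for the all-zero vector")
    return l1 * l1 / l2_squared


if __name__ == "__main__":
    print(awgnc_pseudo_weight([1, 1, 0, 1]))     # binary vector of Hamming weight 3
    print(awgnc_pseudo_weight([1, 0.5, 0.5]))    # a fractional vector
```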



III. The Canonical Completion and 
its Implications 

Consider a $(d_v, d_c)$-regular$^3$ binary code $C$ of length $n$
described by a parity-check matrix $H$. Its Tanner graph [12]
will be denoted by $T = T(H)$, where the set of variable nodes
will be called $\mathcal{V} = \mathcal{V}(T)$, the set of check nodes will be called
$\mathcal{C} = \mathcal{C}(T)$, and a node $v \in \mathcal{V}$ is adjacent to a node $c \in \mathcal{C}$
if and only if the corresponding entry in $H$ equals 1. Given
a variable node $v \in \mathcal{V}$, we let $\Delta_v(T)$ denote the maximal
(graph) distance from $v$ that any node in $T$ can have. Our
goal in this section is to construct a pseudo-codeword whose
impact on the LP decoder depends on the mapping $\mu$. Before
defining this pseudo-codeword, we need a definition.

Definition 1 ([1]): Let $T$ be a Tanner graph. We designate an
arbitrary variable node $v \in \mathcal{V}(T)$ to be the root. We classify
the remaining variable and check nodes according to their
(graph) distance from the root, i.e. the root is at tier 0, all nodes
at distance 1 from the root will be called nodes of tier 1, all
nodes at distance 2 from the root node will be called nodes
of tier 2, etc. We call this ordering "breadth-first spanning
tree ordering with root $v$." Because of the bipartiteness of $T$,
it follows easily that the nodes of the even tiers are variable
nodes whereas the nodes of the odd tiers are check nodes.
Furthermore, a check node at tier $2t + 1$ can only be connected
to variable nodes in tier $2t$ and possibly to variable nodes in
tier $2t + 2$. Note that the last tier is tier $\Delta_v(T)$ and that the
variable nodes are at tiers $0, 2, \ldots, 2\lfloor \Delta_v(T)/2 \rfloor$. □

Definition 2 (Canonical completion [1]): Let $C$ be a binary
$(d_v, d_c)$-regular code with parity-check matrix $H$ and Tanner
graph $T = T(H)$. Let $v \in T$ be an arbitrary variable node.
After performing the breadth-first spanning tree ordering with
root $v$, we construct a vector $\tilde\omega$ in the following way. If bit $i$
corresponds to a variable node in tier $2t$, then

\[
\tilde\omega_i \triangleq \frac{1}{(d_c - 1)^t}.
\]

It is possible to choose a scaling factor $\alpha > 0$ (in fact, a whole
interval of $\alpha$'s) such that $\omega = \alpha \cdot \tilde\omega \in \mathcal{P}(H)$. We call the
resulting pseudo-codeword $\omega$ the canonical completion with
root $v$. □
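
The tier ordering and the resulting unscaled vector can be computed with a plain breadth-first search. The sketch below assumes that a variable node in tier $2t$ receives the value $1/(d_c - 1)^t$, which is the assignment used for the canonical completion in [1]; the function name, the list-of-rows representation of $H$, and the toy matrix (which is not regular) are assumptions of this illustration. The graph is assumed to be connected.

```python
from collections import deque


def canonical_completion_unscaled(h, root, d_c):
    """Breadth-first tier ordering from variable node `root`, followed by the
    assignment 1 / (d_c - 1)**t to every variable node in tier 2t."""
    m, n = len(h), len(h[0])
    var_nbrs = [[j for j in range(m) if h[j][i]] for i in range(n)]
    chk_nbrs = [[i for i in range(n) if h[j][i]] for j in range(m)]

    tier = {("v", root): 0}
    queue = deque([("v", root)])
    while queue:
        kind, idx = queue.popleft()
        if kind == "v":
            neighbors = [("c", j) for j in var_nbrs[idx]]
        else:
            neighbors = [("v", i) for i in chk_nbrs[idx]]
        for node in neighbors:
            if node not in tier:
                tier[node] = tier[(kind, idx)] + 1
                queue.append(node)

    # Variable nodes sit at even tiers 2t; integer division recovers t.
    return [1.0 / (d_c - 1) ** (tier[("v", i)] // 2) for i in range(n)]


if __name__ == "__main__":
    H = [[1, 1, 1, 0],
         [0, 1, 1, 1]]
    print(canonical_completion_unscaled(H, root=0, d_c=3))
```

The values decay geometrically with the distance from the root, which is why the $L_1$ norm of the vector, and hence its pseudo-weight, grows only sublinearly in $n$.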
Theorem 1 ([1]): Same scenario as in Def. 2. The canonical
completion with root $v$ yields a vector $\omega$ such that $\omega$ is in
the fundamental polytope $\mathcal{P}(H)$. Imposing the additional mild
constraint $3 < d_v < d_c$, the pseudo-weight $w_p^{\mathrm{AWGNC}}(\omega)$ of $\omega$
can be upper bounded by

\[
w_p^{\mathrm{AWGNC}}(\omega) \leq \beta'_{d_v,d_c} \cdot n^{\beta_{d_v,d_c}},
\]

where

\[
\beta'_{d_v,d_c} = \left(\frac{d_v (d_v - 1)}{d_v - 2}\right)^{2},
\qquad
\beta_{d_v,d_c} = \frac{\log\left((d_v - 1)^2\right)}{\log\left((d_v - 1)(d_c - 1)\right)} < 1.
\]

□

3 An LDPC code is called a $(d_v, d_c)$-regular code if the uniform column
weight of the relevant parity-check matrix $H$ is $d_v$ and the uniform row
weight of $H$ is $d_c$.
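
A quick numerical check shows how slowly the pseudo-weight bound of the canonical completion grows. The sketch evaluates the exponent $\log((d_v-1)^2) / \log((d_v-1)(d_c-1))$ and, under the assumption that the leading constant is $(d_v(d_v-1)/(d_v-2))^2$, the bound itself; the function names and the choice of a (4, 8)-regular code are illustrative.

```python
import math


def beta_exponent(d_v, d_c):
    """Growth exponent of the pseudo-weight bound; below 1 when d_v < d_c."""
    return math.log((d_v - 1) ** 2) / math.log((d_v - 1) * (d_c - 1))


def pseudo_weight_bound(n, d_v, d_c):
    """Upper bound beta' * n**beta, with the constant beta' assumed to be
    (d_v * (d_v - 1) / (d_v - 2)) ** 2."""
    beta_prime = (d_v * (d_v - 1) / (d_v - 2)) ** 2
    return beta_prime * n ** beta_exponent(d_v, d_c)


if __name__ == "__main__":
    for n in (10**4, 10**6, 10**8):
        print(n, beta_exponent(4, 8), pseudo_weight_bound(n, 4, 8))
```

Because the exponent is strictly below one, the bound eventually falls below any linear function of $n$, in contrast to the linear minimum-distance growth that good code families exhibit under ML decoding.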



Assuming $\mu$ to be the trivial mapping $\mu_{\mathrm{triv}}$, the above
theorem has immediate consequences for the LP decoder: the
LP decision region for $\bar{\mathbf{0}}$ is constrained by a hyperplane whose
squared Euclidean distance from $\bar{\mathbf{0}}$ is at most $\beta'_{d_v,d_c} n^{\beta_{d_v,d_c}}$.
Because $\beta_{d_v,d_c} < 1$, this implies that the word error probability
$P^{\mu_{\mathrm{triv}}}_{\phi_{\mathrm{LP}}}(n)$ of LP decoding is lower bounded:

\[
P^{\mu_{\mathrm{triv}}}_{\phi_{\mathrm{LP}}}(n) >
\left(1 - \frac{1}{K' n^{\beta_{d_v,d_c}}}\right)
\left(2\pi K' n^{\beta_{d_v,d_c}}\right)^{-1/2}
\exp\left(-\frac{K'}{2} n^{\beta_{d_v,d_c}}\right),
\]

where $K'$ is positive and a function of the SNR, independent
of $n$. This observation implies that the reliability function
$\limsup_{n \to \infty} -\frac{1}{n} \log\left(P^{\mu_{\mathrm{triv}}}_{\phi_{\mathrm{LP}}}(n)\right)$ of the AWGNC under LP
decoding approaches zero for any fixed SNR. This is in stark
contrast to ML decoding whose reliability function remains
non-zero for large enough signal-to-noise ratios. In this context
it is interesting to note that Lentmaier et al. [13] could prove
that under some mild technical conditions the block error rate
of a $(d_v, d_c)$-regular code under belief-propagation decoding
with a bounded number of iterations is upper bounded by
$P_{\mathrm{tree}}(n) < n \cdot \exp\left(-K'' n^{\beta_{d_v,d_c}/4}\right)$ for the same constant
$\beta_{d_v,d_c}$, where $P_{\mathrm{tree}}(n)$ refers to the block error rate of a
belief propagation decoding algorithm where the number of
iterations is one quarter the girth of the Tanner graph.

IV. Quantizing and Thresholding 

We still consider the LP decoder, but we want to investigate
what happens when $\mu$ is selected to be something other than
$\mu_{\mathrm{triv}}$. So, let us consider what happens when $\mu = \mu_{\mathrm{Q2},L}$ is
selected for some$^4$ $L > 0$. Actually, it can easily be seen that
the combination of the AWGNC and this quantization gives
(apart from scaling) the same LLR vectors as at the receiver
end of a binary symmetric channel (BSC). Recognizing this,
we can use the results of [6] which show that there exist
families of expander-based $(d_v, d_c)$-regular LDPC codes which
are guaranteed to correct a constant fraction r of errors on
the BSC. By a simple union bound argument we conclude
that for sufficiently large SNR the block error probability is
upper bounded by $P^{\mu_{\mathrm{Q2},L}}_{\phi_{\mathrm{LP}}}(n) < n \exp(-K''' n)$ where again
$K'''$ is positive and independent of $n$. It follows that there
exist families of expander-based $(d_v, d_c)$-regular LDPC codes
where $\limsup_{n \to \infty} -\frac{1}{n} \log\left(P^{\mu_{\mathrm{Q2},L}}_{\phi_{\mathrm{LP}}}(n)\right)$ is strictly larger than
zero under LP decoding, for sufficiently large SNR.

What explains this advantage in the asymptotic behavior?
Looking at the above results we have to consider two
candidates: (i) the quantized values of the modified LLRs or
(ii) the finite support of the modified LLRs. It turns out
that the answer is given by (ii), namely it is sufficient to
threshold the LLRs, whereas quantization as in (i) is not
really necessary. As is shown in Section V, one can set
$\mu = \mu_{\mathrm{T},W}$ (see (2)) for any finite $W > 1$ and construct
classes of $(d_v, d_c)$-regular expander-based LDPC codes where
$\limsup_{n \to \infty} -\frac{1}{n} \log\left(P^{\mu_{\mathrm{T},W}}_{\phi_{\mathrm{LP}}}(n)\right)$ is non-zero under LP
decoding.$^5$

4 Note that the result of the LP decoder is independent of the exact choice
of $L > 0$.

5 The constraint $W > 1$ is not necessary, but was imposed to simplify the
presentation; Th. 2 holds for any $W > 0$.



Theorem 2: Consider the setup as described in Sec. I where
we transmit over an AWGNC with noise power $\sigma^2 = N_0/2$.
For any finite truncation value $W > 1$, any constant rate
$0 < r < 1$, and sufficiently small $\sigma^2 > 0$, there exists a family
of $(d_v, d_c)$-regular Tanner graphs for low-density parity-check
codes of increasing length, each with rate at least $r$, such that
$\limsup_{n \to \infty} -\frac{1}{n} \log\left(P^{\mu_{\mathrm{T},W}}_{\phi_{\mathrm{LP}}}(n)\right)$ is strictly larger than zero.
Proof: See Section V. ■

Putting the above results for the LP decoding with the different
mappings $\mu = \mu_{\mathrm{triv}}$ and $\mu = \mu_{\mathrm{T},W}$ in juxtaposition reveals
a surprising property of LP decoding. For values of SNR where
both the lower bound on $P^{\mu_{\mathrm{triv}}}_{\phi_{\mathrm{LP}}}$ and the upper bound on $P^{\mu_{\mathrm{T},W}}_{\phi_{\mathrm{LP}}}$
are non-trivial it is actually advantageous for (certain classes
of) long codes to threshold the LLRs before attempting to
decode. In other words, since there is an $n$ large enough (as a
function of $K'$ and $K'''$) such that $n \exp(-K''' n)$ is less than
$\left(1 - 1/(K' n^{\beta_{d_v,d_c}})\right)\left(2\pi K' n^{\beta_{d_v,d_c}}\right)^{-1/2} \exp\left(-\frac{K'}{2} n^{\beta_{d_v,d_c}}\right)$,
operating on the thresholded versions of the LLRs will yield a
smaller probability of error than retaining the full information
contained in $\lambda$.$^6$
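
The crossover between the two bounds can be located numerically. The sketch below compares the logarithm of the upper bound $n \exp(-K''' n)$ for thresholded LLRs with the logarithm of the lower bound for the trivial map; working in the log domain avoids floating-point underflow. The values chosen for $K'$, $K'''$ and the exponent are purely hypothetical, since the paper only states that these constants are positive and depend on the SNR.

```python
import math


def log_upper_bound_thresholded(n, k3):
    """log of n * exp(-K''' n)."""
    return math.log(n) - k3 * n


def log_lower_bound_trivial(n, k1, beta):
    """log of (1 - 1/x2) * (2 pi x2)**(-1/2) * exp(-x2 / 2), x2 = K' n**beta.

    Returns -inf where the bound is trivial (x2 <= 1)."""
    x2 = k1 * n ** beta
    if x2 <= 1.0:
        return float("-inf")
    return math.log(1.0 - 1.0 / x2) - 0.5 * math.log(2.0 * math.pi * x2) - 0.5 * x2


def thresholding_wins(n, k1, k3, beta):
    """True if the bounds guarantee a smaller error probability with thresholding."""
    return log_upper_bound_thresholded(n, k3) < log_lower_bound_trivial(n, k1, beta)


if __name__ == "__main__":
    beta = math.log(9) / math.log(21)   # exponent for a (4, 8)-regular code
    for n in (10, 100, 1000, 10**4):
        print(n, thresholding_wins(n, k1=1.0, k3=0.1, beta=beta))
```

For short lengths the upper bound is still above the lower bound and nothing can be concluded; the guarantee only takes effect once the linear exponent $K''' n$ dominates the sublinear exponent $\frac{K'}{2} n^{\beta_{d_v,d_c}}$.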

What does this mean for a pseudo-codeword $\omega$ associated
with a canonical completion? Roughly speaking, the mappings
$\mu_{\mathrm{T},W}$ and $\mu_{\mathrm{Q2},L}$ bend the vector $\lambda$ in such a way that
the pseudo-codeword $\omega$ is less often the result of the LP
decoder. This bending, which for an optimal decoder can only
deteriorate its performance, turns out to be overall helpful for a
sub-optimal algorithm like the LP decoder, at least for certain
interesting classes of LDPC codes and large enough SNRs.

V. Proof of Theorem 2

This section is devoted to proving Th. 2. Before we start
going through the different steps of the proof, we introduce
some useful notation. For an integer $n$, we use $[n]$ to denote
the set of integers from 1 to $n$. We use $T(n, m)$ to denote
a Tanner graph with $n$ variable nodes and $m$ check nodes.
For such a Tanner graph, we will usually identify the set of
variable nodes $\mathcal{V}$ with $[n]$ and the set of check nodes $\mathcal{C}$ with
$[m]$. For a set of nodes $S$, let $N(S)$ denote the neighbor set
of $S$.

Definition 3: A Tanner graph $T$ with variable node set $\mathcal{V}$ of
size $n$ is an $(\alpha n, \beta)$-expander if all sets $S \subset \mathcal{V}$ with $|S| < \alpha n$
have $|N(S)| > \beta |S|$. □

The following proposition follows from [14] (see also [15]):

Proposition 3: Let $0 < r < 1$, and let $d_v$ and $d_c$ be positive
integers such that $r = 1 - \frac{d_v}{d_c}$. Then for any $0 < \delta < 1 - \frac{1}{d_v}$,
and sufficiently large $n$, there exists a Tanner graph with $n$
variable nodes, $m = n d_v / d_c$ check nodes, uniform variable
node degree $d_v$, and uniform check degree $d_c$, which is an
$(\alpha n, \delta d_v)$-expander, where $0 < \alpha < 1$ is a constant that does
not depend on $n$. Moreover, a randomly constructed graph has
these properties with high probability. □

For the given truncation value $W$ in Th. 2, let $d_v$ be any
integer greater than $4(4W + 2)$. Let $\tilde\delta$ be any constant where
… $(4W + 2)$. Now let $\delta$ be the largest value
that is less than or equal to $\tilde\delta$ such that $\delta d_v$ is an integer. Note
that $\tilde\delta - \delta < \frac{1}{d_v}$. This implies that $\delta > 1 - \frac{1}{4W + 2}$.

6 A similar comment can be made about LP decoding with $\mu = \mu_{\mathrm{triv}}$
vs. $\mu = \mu_{\mathrm{Q2},L}$: there is an $n$ from where on it is better to work with the
one-bit quantized LLRs than with the original LLRs.

From Prop. 3 we obtain a family of Tanner graphs; each
graph $T(n, m)$ has uniform variable degree $d_v$, uniform check
degree $d_c$, has $r = 1 - \frac{d_v}{d_c}$, and is an $(\alpha n, \delta d_v)$-expander, for
some constant $\alpha$ that does not depend on $n$. Fix a particular
length $n$, and call $C = C(n, m)$ the code defined by the Tanner
graph $T = T(n, m)$ from the family.

Suppose the vector $+\mathbf{1} = \bar{\mathbf{0}} \in \bar C$ is transmitted over the
AWGNC. Define $U = \{i \in [n] : \lambda'_i < 1/2\}$, where $\lambda'$
is defined according to (2).$^7$ This set represents the variable
nodes with "high noise." For one particular $i \in [n]$, define
$p(\sigma^2)$ as the probability that $i \in U$. Note that $p(\sigma^2)$ is the
same for all $i$, is a function only of the variance $\sigma^2$, and goes
to zero as $\sigma^2$ goes to zero.

Define $\gamma = $ … . Note that $0 < \gamma < 1$. Let $\sigma^2$
be sufficiently small so that $p(\sigma^2) < \frac{\alpha}{2(1 + \gamma)}$. By a simple
Chernoff bound we have that

\[
|U| < \frac{\alpha n}{2(1 + \gamma)} < \frac{\alpha n - 1}{1 + \gamma} \tag{5}
\]

with probability at least $1 - 2^{-\Omega(n)}$. In other words, with high
probability, the set of nodes with high noise is "small."
We let $\delta' = 2\delta - 1$ and define

\[
\dot U = \left\{ i \in \mathcal{V} \;:\; i \notin U \text{ and } |N(i) \cap N(U)| > (1 - \delta') d_v \right\}.
\]

The set $\dot U$ represents the variable nodes that do not have high
noise, but do have high connectivity to the neighbors of the
nodes with high noise.
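
The two node sets used in the proof are easy to compute for a concrete graph. The sketch below forms the set of "high noise" nodes (modified LLR below 1/2) and the set of remaining nodes with more than $(1 - \delta') d_v$ neighbors in common with it, where $\delta' = 2\delta - 1$. The function name, the list-of-rows representation of $H$, and the tiny degree-2 graph are assumptions made for illustration; the graph is far too small to satisfy the degree requirements of the proof.

```python
def noisy_sets(h, llrs_mod, delta, d_v, cutoff=0.5):
    """Return (U, U_dot): U holds variable nodes whose modified LLR is below
    `cutoff`; U_dot holds the other variable nodes that share more than
    (1 - delta') * d_v check neighbors with N(U), where delta' = 2*delta - 1."""
    m, n = len(h), len(h[0])
    var_nbrs = [{j for j in range(m) if h[j][i]} for i in range(n)]

    u = {i for i in range(n) if llrs_mod[i] < cutoff}
    n_of_u = set().union(*(var_nbrs[i] for i in u))   # check nodes adjacent to U

    delta_prime = 2 * delta - 1
    limit = (1 - delta_prime) * d_v
    u_dot = {i for i in range(n)
             if i not in u and len(var_nbrs[i] & n_of_u) > limit}
    return u, u_dot


if __name__ == "__main__":
    H = [[1, 1, 0],
         [1, 0, 1],
         [0, 1, 1]]
    print(noisy_sets(H, [-0.1, 1.0, 1.0], delta=0.9, d_v=2))
```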

We appeal to the following, which uses the same argument 
as a similar theorem in [6]: 

Theorem 4: If $T$ is an $(\alpha n, \delta d_v)$-expander and $|U| < \frac{\alpha n - 1}{1 + \gamma}$,
then $|U| + |\dot U| < \alpha n$. □

Using (5) together with this theorem, we have that $|U| +
|\dot U| < \alpha n$ with probability at least $1 - 2^{-\Omega(n)}$. At this point
we will apply what we know about the expansion of the graph
to prove that the LP decoder succeeds. We first need another
definition and proposition from [6]:

Definition 4 ([6]): A $\delta$-matching of $U$ is a subset $M$ of the
edges incident to $U' = U \cup \dot U$ such that (i) every check node
is incident to at most one edge of $M$, (ii) every node in $U$ is
incident to at least $\delta d_v$ edges of $M$, and (iii) every node in $\dot U$
is incident to at least $\delta' d_v$ edges of $M$. □

Proposition 5 ([6]): If $T$ is an $(\alpha n, \delta d_v)$-expander with $\delta d_v$
an integer, and $|U| + |\dot U| < \alpha n$, then $U$ has a $\delta$-matching. □

It remains to show how the existence of a $\delta$-matching proves
that the LP decoder will succeed. To prove that the LP decoder
succeeds, we use the method of finding a dual witness. More
details, as well as a general treatment of this technique, can be
found in [6], [10]. Here, we state the definition and theorem
relevant to this application:

7 The value 1/2 in the definition of $U$ was set for simplicity. The main
theorem will go through for any $W > 0$, as long as this constant "1/2" is
less than 1, greater than zero, and less than or equal to $W$.



Definition 5 ([6]): Given a Tanner graph $T(n, m)$, and a
vector of LLRs $\lambda'$, a setting of weights $\{\tau_{ij}\}$ to the edges
$(i, j)$ in $T$ is feasible if (i) for all checks $j \in [m]$ and distinct
$i, i' \in N(j)$ we have $\tau_{ij} + \tau_{i'j} \geq 0$, and (ii) for all nodes
$i \in [n]$, we have $\sum_{j \in N(i)} \tau_{ij} < \lambda'_i$. □

Theorem 6 ([6]): Under any memoryless binary-input
output-symmetric channel, using any binary linear code,
under the assumption that $+\mathbf{1} = \bar{\mathbf{0}}$ is transmitted, the LP
decoder (using a Tanner graph $T$ for the code) succeeds if
and only if there exists a feasible weight assignment to the
edges of $T$. □

Finally, using a line of reasoning similar to [6], we establish
that a $\delta$-matching is sufficient to guarantee a feasible edge
weight assignment, and thus a proof that the LP decoder
succeeds. Here is where we use our bound on $\delta$ in terms of
$W$:

Theorem 7: If $U$ has a $\delta$-matching, and $\delta > 1 - \frac{1}{4W + 2}$,
then there exists a feasible edge weight assignment. □
Proof: Given a $\delta$-matching $M$, we assign weights to
each edge in the graph as follows; we later specify the
parameter $\kappa > 0$.

• For all $j$ such that $(i, j) \in M$ for some $i \in U$, set $\tau_{ij} =
-\kappa$, and set $\tau_{i'j} = \kappa$ for all $i' \in N(j) \setminus \{i\}$.

• For all other $j$, set $\tau_{ij} = 0$ for all $i \in N(j)$.

This weighting clearly satisfies condition (i) of a feasible
weight assignment. For the second condition, there are three
cases.

1) For a variable node $i \in U$, we have $-W \leq \lambda'_i < 1/2$.
By definition of $M$, at least $\delta d_v$ edges incident to $i$ have
$\tau_{ij} = -\kappa$. All other incident edges have $\tau_{ij} \in \{0, \kappa\}$,
and so the total weight of edges incident to $i$ is at most
$\delta d_v (-\kappa) + (1 - \delta) d_v \kappa = (1 - 2\delta) d_v \kappa$. If we maintain
(a) $\kappa > \frac{W}{(2\delta - 1) d_v}$, then this total weight is less than $-W$,
which is less than or equal to $\lambda'_i$, as required.

2) For a variable node $i \in \dot U$, we have $\lambda'_i \geq 1/2$. At
least $\delta' d_v$ edges incident to $i$ are in $M$, and therefore
have weight 0, by the definition of $M$ and the weight
assignment. All other edges have weight 0 or $+\kappa$.
Therefore the total weight of incident edges is at most
$(1 - \delta') d_v \kappa = 2(1 - \delta) d_v \kappa$. If we maintain (b) $\kappa <
\frac{1}{4(1 - \delta) d_v}$, then this weight is less than 1/2, which
is less than or equal to $\lambda'_i$, as required.

3) For a variable node $i \notin (U \cup \dot U)$, by definition this
variable node has at least $\delta' d_v$ edges not incident to
$N(U)$. These edges all have weight 0, and so we get
the same condition (b) as in the previous case.

Combining our requirements (a) and (b) on $\kappa$, we get the
overall requirement $\frac{2\delta - 1}{4(1 - \delta)} > W$, which is equivalent to our
assumption on $\delta$. ■
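
The weight construction in the proof can be checked mechanically on a small instance. The sketch below assigns $-\kappa$ to each matched edge leaving a high-noise node, $+\kappa$ to the other edges of the same check, and 0 elsewhere, and then tests the two feasibility conditions (pairwise sums at each check non-negative, total weight at each variable node strictly below its modified LLR). It also computes the admissible range for $\kappa$ from requirements (a) and (b). All function names and the toy graph are assumptions of this illustration; the toy graph has degree 2 and does not satisfy the degree hypotheses of the theorem.

```python
def assign_weights(chk_nbrs, u, matching, kappa):
    """Edge weights tau[(i, j)] following the construction in the proof."""
    tau = {(i, j): 0.0 for j, nbrs in enumerate(chk_nbrs) for i in nbrs}
    for (i, j) in matching:
        if i in u:                       # only matched edges leaving U matter
            for i2 in chk_nbrs[j]:
                tau[(i2, j)] = -kappa if i2 == i else kappa
    return tau


def is_feasible(chk_nbrs, tau, llrs_mod):
    """Check conditions (i) and (ii) of a feasible weight assignment."""
    for j, nbrs in enumerate(chk_nbrs):
        for a in nbrs:
            for b in nbrs:
                if a != b and tau[(a, j)] + tau[(b, j)] < 0:
                    return False
    totals = [0.0] * len(llrs_mod)
    for (i, _j), weight in tau.items():
        totals[i] += weight
    return all(total < llr for total, llr in zip(totals, llrs_mod))


def kappa_interval(w, delta, d_v):
    """Open interval (low, high) for kappa from (a) and (b); empty if low >= high."""
    low = w / ((2 * delta - 1) * d_v)
    high = 1.0 / (4 * (1 - delta) * d_v)
    return low, high


if __name__ == "__main__":
    chk_nbrs = [[0, 1], [0, 2], [1, 2]]   # check j -> adjacent variable nodes
    llrs_mod = [-0.1, 1.0, 1.0]           # node 0 has "high noise"
    tau = assign_weights(chk_nbrs, u={0}, matching={(0, 0), (0, 1)}, kappa=0.5)
    print(is_feasible(chk_nbrs, tau, llrs_mod))
    print(kappa_interval(w=1.0, delta=21 / 25, d_v=25))
```

With $W = 1$, $d_v = 25$ and $\delta = 21/25$ the interval for $\kappa$ is non-empty, whereas a value such as $\delta = 0.8$, which violates $\delta > 1 - \frac{1}{4W+2}$, leaves no admissible $\kappa$.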
Putting it all together, we have shown that for an arbitrary
truncation value $W$, and rate $r$, there is a sufficiently small
$\sigma^2$ and a family of $(d_v, d_c)$-regular graphs on which the LP
decoder succeeds with probability $1 - 2^{-\Omega(n)}$ when $+\mathbf{1} = \bar{\mathbf{0}}$
is transmitted over an AWGNC with noise power $\sigma^2$ and with
LLR modification $\mu = \mu_{\mathrm{T},W}$. The assumption that $+\mathbf{1} = \bar{\mathbf{0}}$ is
transmitted is without loss of generality because the polytope
is "C-symmetric" (see [4], [3] for details). Thus we have
shown that the word error rate of the LP decoder decreases
exponentially.